<!DOCTYPE html>
<!--[if IE 8]>
<html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en"> <!--<![endif]-->
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Multi-modal Data Processing for Foundation Models: Practical
        Guidances and Use Cases</title>
    <!-- Bootstrap -->
    <link rel="stylesheet"
          href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css"
          integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm"
          crossorigin="anonymous">
</head>

<body>
<nav class="navbar navbar-expand-lg navbar-light bg-light">
    <a class="navbar-brand" href="#">KDD 2024 Hands-on Tutorial</a>
    <button class="navbar-toggler" type="button" data-toggle="collapse"
            data-target="#navbarSupportedContent"
            aria-controls="navbarSupportedContent" aria-expanded="false"
            aria-label="Toggle navigation"><span
            class="navbar-toggler-icon"></span></button>
    <div class="collapse navbar-collapse" id="navbarSupportedContent">
        <ul class="navbar-nav mr-auto">
            <li class="nav-item active"><a class="nav-link" href="#">Home <span
                    class="sr-only">(current)</span></a></li>
            <li class="nav-item"><a class="nav-link"
                                    href="#Schedule">Schedule</a></li>
            <li class="nav-item"><a class="nav-link" href="#Organizers">Organizers</a>
            </li>
        </ul>
    </div>
</nav>
<header>
    <div class="jumbotron">
        <div class="container">
            <div class="row">
                <div class="col-10 col-lg-12">
                    <h2 class="text-center">KDD 2024 Hands-on Tutorial</h2>
                    <h1 class="text-center"><strong>Multi-modal Data Processing
                        for Foundation Models: Practical Guidances and Use
                        Cases</strong></h1>
                    <h4 class="text-center"><em><strong>Date &amp;
                        Time</strong></em>: 10:00 AM - 1:00 PM, August 25, 2024
                    </h4>
                    <h4 class="text-center"><em><b>Location</b></em>: Room 124-125, Centre de
                        Convencions Internacional de Barcelona </h4>
                </div>
            </div>
        </div>
        <p>In the era of foundation models, processing multi-modal data
        efficiently is crucial.
        This tutorial covers key techniques for multi-modal data processing and
        introduces the open-source Data-Juicer system, designed to tackle the
        complexities of data variety, quality, and scale.
        Participants will learn how to use Data-Juicer's operators and tools
        for formatting, mapping, filtering, deduplicating, and selecting
        multi-modal data efficiently and effectively.
        They will also become familiar with the Data-Juicer Sandbox Lab, where
        users can easily experiment with diverse data recipes (methodical
        sequences of operators) and streamline the creation of
        scalable data processing pipelines.
        This hands-on experience reinforces the concepts discussed and provides
        a space for innovation and exploration, highlighting how data recipes
        can be optimized and deployed in high-performance distributed
        environments.</p>
        <p>By the end of this tutorial, attendees will have the
        practical knowledge and skills to navigate multi-modal data
        processing for foundation models. They will leave with actionable
        knowledge of an industrial open-source system and an enriched
        perspective on the importance of high-quality data in AI, ready to
        implement sustainable and scalable solutions in their projects.</p>
        <p>The system and related materials are available at
            <a href="https://github.com/datajuicer/data-juicer">https://github.com/datajuicer/data-juicer</a>.
        </p></div>


</header>
<section>
    <div class="container">
        <div class="row">
            <div class="col-12 mb-2 text-center">
                <h2><a id="Schedule">Schedule</a></h2>
            </div>
        </div>
        <div class="row">
            <div class="col-sm-6 col-lg-12" style="margin-bottom: 3em;">
                <h6 class="text-left"><b>Date &amp; Time</b>: 10:00 AM - 1:00 PM, August 25, 2024</h6>
                <h6 class="text-left"><b>Location</b>: Room 124-125, Centre de Convencions Internacional de Barcelona</h6>
                <p></p>
                <h6 class="text-left">(20 min) | Introduction and Overview:
                    Multi-modal Data Processing and the
                    Data-Juicer System</h6>
                <h6 class="text-left">(20 min) | Building Blocks of Data
                    Processing: Data-Juicer’s Operators</h6>
                <h6 class="text-left">(20 min) | Composing Atomic Capabilities:
                    Data-Juicer’s Data Recipes</h6>
                <h6 class="text-left">(30 min) | Exploring Data Recipes: The
                    Data-Juicer Sandbox Lab</h6>
                <h6 class="text-left">(30 min) | From Exploration to
                    Production: High-Performance Data Factory</h6>
                <h6 class="text-left">(45 min) | Use Cases: From Text to Video
                    Data Processing</h6>
                <h6 class="text-left">(15 min) | Resources and Conclusion</h6>
            </div>
        </div>
        <div class="row">
            <div class="col-sm-6 col-lg-12">
                <p><a href="https://github.com/datajuicer/data-juicer/tree/feat/kdd24/tutorials/slides.pdf" target="_blank"><b>Tutorial slides
                    (pdf)</b></a></p>
                <p><a href="https://github.com/datajuicer/data-juicer/tree/feat/kdd24/tutorials/notebooks" target="_blank"><b>Tutorial Jupyter notebooks
                    </b></a></p>
                <p>
                    <a href="https://github.com/datajuicer/data-juicer/blob/main/docs/awesome_llm_data.md"
                       target="_blank"><b>Related awesome list</b></a></p>
            </div>
        </div>
    </div>
    <div class="container">

        <div class="row">
            <div class="col-lg-12 mb-4 mt-2 text-center">
                <h2><a id="Organizers">Organizers</a></h2>
            </div>
        </div>
        <div class="row">
            <div class="col-lg-12 mb-4 mt-2 text-center">
                <h5>We are the <a
                        href="https://github.com/datajuicer/data-juicer?tab=readme-ov-file#references"
                        target="_blank">Data-Juicer</a> team from Alibaba
                    Tongyi</h5>
                <img src="https://img.alicdn.com/imgextra/i3/O1CN017Eq5kf27AlA2NUKef_!!6000000007757-0-tps-1280-720.jpg"
                     width="640" height="360" alt="Data-Juicer"/>
            </div>
        </div>

        <div class="row">
            <div class="col-lg-12 mb-4 mt-2 text-center">
                <h4><a id="Speakers">On-site Speakers</a></h4>
            </div>
        </div>

        <div class="container">
            <div class="row justify-content-center">
                <div class="col-lg-2 mb-4 mt-2 text-center">
                    <img alt="Daoyuan Chen" class="rounded-circle"
                         style="width: 140px; height: 140px;"
                         src="https://img.alicdn.com/imgextra/i4/O1CN01wvOXAa231p3nkuMAL_!!6000000007196-2-tps-435-435.png"
                         data-holder-rendered="true">
                    <h4><a href="https://yxdyc.github.io/"
                           target="_blank" style="white-space: nowrap;">Daoyuan
                        Chen</a></h4>
                </div>
                <div class="col-lg-2 mb-4 mt-2 text-center">
                    <img alt="Yaliang Li" class="rounded-circle"
                         style="width: 140px; height: 140px;"
                         src="https://img.alicdn.com/imgextra/i2/O1CN01xdrwpu1gJvYOjAV1X_!!6000000004122-2-tps-435-435.png"
                         data-holder-rendered="true">
                    <h4><a href="https://sites.google.com/site/yaliangli/"
                           target="_blank" style="white-space: nowrap;">Yaliang
                        Li</a></h4>
                </div>
                <div class="col-lg-2 mb-4 mt-2 text-center">
                    <img alt="Bolin Ding" class="rounded-circle"
                         style="width: 140px; height: 140px;"
                         src="https://img.alicdn.com/imgextra/i3/O1CN01a2vrTs1LEKmoWd6NA_!!6000000001267-2-tps-435-435.png"
                         data-holder-rendered="true">
                    <h4><a href="https://bolinding.github.io/"
                           target="_blank" style="white-space: nowrap;">Bolin
                        Ding</a></h4>
                </div>
            </div>
        </div>
    </div>


</section>
<div class="container"></div>
<footer class="text-center">
    <div class="container">
        <div class="row">
            <div class="col-12"></div>
        </div>
    </div>
</footer>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="js/jquery-3.2.1.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/popper.min.js"></script>
<script src="js/bootstrap-4.0.0.js"></script>
</body>
</html>